Linear Regression using gradient descent

This project demonstrates the linear regression algorithm.

First, the necessary libraries.

In [30]:
import numpy as np
import matplotlib.pyplot as plt
from numpy import linalg as LA
import plotly
import plotly.graph_objects as go
import random, time
from collections import Counter
plotly.offline.init_notebook_mode() #make plots available in html

It is well known that although linear regression can be solved in the closed form:

$\mathbf{v}=(X^TX)^{-1}X^Ty$ where v is a vector of coefficients, "weights".

it is computationally more efficient to use gradient descent to approximate the minimum v of the cost function, J.

$\mathbf{v}^* = \underset{\mathbf{v}\in\mathbb{R}^{n+1}}{\mathrm{argmin}} J(\mathbf{v})$

where the cost function is $J(\mathbf{v})=\frac{1}{m}(Xv-y)^T(Xv-y)$

and its derivative is $\frac{d}{dv}J(\mathbf{v})=\frac{2}{m}(Xv-y)^TX$

The code implementation of the cost function, its derivative, and gradient descent is as follows.

In [31]:
def J(X,y,v):
    ######################### your code goes here ########################
    m = np.size(X,0)
    return 1/m * (((X@v)-y).T)@((X@v)-y)
    
def DJ(X,y,v):
    ######################### your code goes here ########################
    m = np.size(X,0)
    return (1/m) * X.T @ (X@v - y)


def GD_linreg(X,y,alpha,epsilon,max_iters = 10000): 
    ######################### your code goes here ########################
    m, n = np.shape(X)
    onesvect = np.ones((m,1)) 
    X_hat = np.concatenate((onesvect,X), axis = 1)
    condition = True 
    iters = 0
    costs = [10]
    v = np.zeros((n+1,1)) #note: v should be length of number of features + 1
    while condition and iters < max_iters:
        v = v - (alpha * DJ(X_hat,y,v))
        costs.append(J(X_hat,y,v))
        iters +=1
        if ((iters-1)/100).is_integer():
            print(f'After {iters} steps the cost is {costs[iters]}')
        condition = abs(costs[iters] - costs[iters-1]) > epsilon  
    endcost = len(costs)
    print(f'After {endcost-1} steps the cost is {costs[endcost-1]}')
    return (v,costs)

A random normal distribution of points in 3 dimensions is generated and visualized in the following cell. This will be the data used to test or "fit" our linear regression model.

In [32]:
X = np.random.normal(0, 500, size=(4000,2))
y = np.array([[np.random.normal(3,2)*point[0]+np.random.normal(4,2)*point[1]+np.random.normal(5,5)] for point in X])
Xhat = np.concatenate((np.ones((len(X),1)),X), axis = 1)
#plot
plot_figure = go.Figure(data=[go.Scatter3d(x=Xhat[:,1], y=Xhat[:,2], z=[r[0] for r in y], mode='markers',marker=dict(size=2))])
plotly.offline.iplot(plot_figure)

The following cell runs our model with appropriate parameters and outputs a few instances of the cost value.

In [33]:
alpha = 3*10e-8
epsilon = .01
max_iters = 10000
v,costs = GD_linreg(X,y,alpha,epsilon,max_iters)
After 1 steps the cost is [[7203178.87961317]]
After 101 steps the cost is [[2013890.51361883]]
After 122 steps the cost is [[2013889.18712409]]

By plotting the costs in the next cell we can visualize convergence.

In [34]:
plt.plot(costs)
plt.xlabel('number of iterations')
plt.ylabel('Cost J(v)')
plt.show()

Thus, our resulting hyperplane has been properly fitted to the model.

In [35]:
print("The hyperplane is:")
print(v)
The hyperplane is:
[[4.37103480e-04]
 [3.02309488e+00]
 [3.96222755e+00]]

In the following cell we can visualize how well this hyperplane fits the data.

In [36]:
yy = [r[0] for r in y]
trace = go.Scatter3d(x=Xhat[:,1], y=Xhat[:,2], z=[r[0] for r in y], mode='markers',marker=dict(size=2))
xs,ys = Xhat[:,1],Xhat[:,2]
new_x = np.outer(np.linspace(min(xs), max(xs), 30), np.ones(30))
new_y = np.outer(np.linspace(min(ys), max(ys), 30), np.ones(30)).T
new_z = v[0]+v[1]*new_x + v[2]*new_y
# Configure the layout.
layout = go.Layout(margin={'l': 0, 'r': 0, 'b': 0, 't': 0})
data = [trace,go.Surface(x=new_x, y=new_y, z=new_z, showscale=False, opacity=0.5)]
# Render the plot.
plot_figure = go.Figure(data=data, layout=layout)
plotly.offline.iplot(plot_figure)